This paper presents the theoretical and empirical foundations of the TOEFL Junior® assessment and its development process. The
[...] adolescent learners, who are being introduced to English as a second or foreign language at a much younger age than ever before.

This framework document describes the key elements of the TOEFL Junior® test and its development process. By documenting
the design framework, including the construct definition and test characteristics, we demonstrate that our test
design and development processes meet high professional standards in order to produce a quality assessment. The document
also serves as a reference point during investigations of validity evidence to support the intended test uses over time.

The test purposes and intended uses, target population, target language use domains, and test constructs of TOEFL 
Junior are described in this framework. Also included is a description of the overall test structure and scoring system, 
demonstrating how the constructs are operationalized. Finally, we outline research topics to support the interpretive 
argument of the use of the test. 

The generic name, TOEFL Junior, will be used inclusively in this document to refer to the two TOEFL Junior tests (the
paper-based TOEFL Junior Standard test and the computer-delivered TOEFL Junior Comprehensive test) when the
discussion applies equally to both tests. However, the specific name will be used when the discussion is only pertinent to
that test, particularly with relation to the overall test and section structure and the scoring system of each test. The decision 
to develop two different versions of the test was made to reach a wider potential population of test takers and to provide 
stakeholders with the option to select a version that best meets their needs and serves their purposes. For example, whereas 
TOEFL Junior Comprehensive provides more information by measuring all four language skills, including speaking and 
writing (which are not measured in TOEFL Junior Standard), its cost and administration requirements may not make it the 
test of choice for all settings. On the other hand, TOEFL Junior Standard, which consists exclusively of selected-response items
delivered by paper and pencil, has quicker score turnaround and can be used more flexibly because it does not require
computers. Further information about how the two versions of the test differ is presented in later sections of this document.

Background 

Generating a New Assessment 

English proficiency is an increasingly important competence to develop for students worldwide. Mastery of English 
expands access to a range of educational, personal, and professional opportunities. As a result, in many education systems
around the globe, English is a regular part of public school curricula. Whereas some countries introduce English
into the curriculum in secondary school, other public systems (and private ones as well) start English instruction at much
lower grades (e.g., third grade in Korea, first grade in China). English as a foreign language (EFL) instructional programs 
are now attempting more ambitious learning objectives worldwide, with an emphasis on communicative language ability 
(cf. Bailey, Heritage, & Butler, 2014). This educational context increases the need for well-designed, objective measures of 
proficiency in English for young learners. 

TOEFL Junior has been developed to address this need by providing much-needed information on the English language 
proficiency (ELP) attainment of young adolescent EFL learners worldwide. 

As part of the TOEFL family of assessments, TOEFL Junior focuses on English learners’ ability to communicate in an 
academic environment where English is the medium of instruction; that is, the test is intended to measure the communicative
ability students need to participate in English-medium school settings. TOEFL Junior complements the existing
university-level TOEFL assessments by assessing this proficiency at the middle school level. 

English-medium instructional environments can take a range of forms, including (a) public or private schools in 
English-dominant countries (e.g., the United States, the United Kingdom, Canada, Australia); (b) international schools 
in non-English-dominant countries in which content instruction is delivered in English (e.g., International Baccalaureate 
World Schools); and (c) schools in any country that use either bilingual or content- and language-integrated learning 
approaches in which some content instruction is delivered in English. Although these instructional models are different 
in important respects, each calls for students to use English to learn new information in content areas. We also maintain 
that the traditional distinction between English as a second language (ESL) and EFL is of little importance in the aforementioned
instructional models; the most relevant feature of all models is that English is used as an instructional language
regardless of whether English is the language of communication outside of school. To differing degrees, these models also 
call for the use of English for nonacademic purposes, such as for social interactions, service encounters, and classroom 
management. 

Proficiency for English-medium instructional environments may be aspirational for many EFL learners. For EFL learners
with no specific plans to enter a program of instruction in English, TOEFL Junior will provide objective information
about how their ELP relates to the standard embodied by the TOEFL Junior assessment. In providing an international 
benchmark for English learning, TOEFL Junior can serve as a general progress measure, providing students, parents, 
teachers, and schools with an objective measure of students’ ELP. 

Educational Significance 

As the need to learn English increases, so does the need for appropriate measures to inform students of their English 
proficiency levels. Yet, relatively few international assessments are available for adolescent EFL students. Given the wide 
range of EFL contexts with varied standards and curricula, an international English proficiency assessment would be 
instrumental in providing some degree of standardized information about learners’ proficiency levels. TOEFL Junior has 
been designed to measure students’ proficiency levels and provide useful information about the stage of English proficiency
they have attained. Students and teachers will also be informed of the various aspects of ELP needed to function in
English-medium school settings. Thus, TOEFL Junior results have the potential to help English learners and their teachers 
set appropriate learning goals for the development of English proficiency. 

Test Purpose and Intended Uses 

TOEFL Junior is a measure of the English language ability of young students whose first language is not English and 
who are in the process of developing the proficiency required to participate in an English-medium instructional environment.
The test measures language proficiency in situations and tasks representative of English-medium school contexts.
Though some test tasks assess underlying enabling skills, such as grammatical and lexical knowledge, the main
emphasis of the test is the measurement of communicative competence, that is, the ability to use language for communicative
purposes. Test scores are intended to be used as indicators of the proficiency levels of students in the target
population. 

The following have been identified as appropriate intended uses of TOEFL Junior test scores for the target population: 
(a) to determine the ELP levels of students on the basis of their performance on tasks representative of English-medium 
instructional environments at the middle school level, (b) to support decisions regarding placement of students into
programs designed to increase their proficiency in academic and social English, and (c) to provide information about
student progress in developing ELP over time. 


Target Population 

TOEFL Junior is designed for students for whom English is a foreign language and who aspire to participate in
English-medium instructional environments at the middle school level. Test takers will typically range in age from 11 to 15 years.
They are both male and female, with a wide variety of nationalities and native languages. Their educational backgrounds 
and real-world experiences will vary, but they are typically expected to have at least 5 full years of educational experience 
at the elementary and/or middle school level. 

Identifying the Test Domains of Language Use 

Identifying the characteristics of target language use (TLU) domains or situations is necessary to support the claim that test 
takers’ performance in test tasks relates to their expected performance in real-life communicative situations. Normally, 
the closer the correspondence between TLU tasks and test tasks, the greater the validity of interpretations about a test 
taker’s language proficiency based on his or her test performance (Bachman & Palmer, 1996, 2010). TLU descriptions 
thus provide useful guidelines for the development of item and task specifications. They can also serve as a basis for 
evaluating the authenticity and appropriateness of test content. 

In the process of designing TOEFL Junior, a design team of Educational Testing Service (ETS) researchers, test developers,
and consultants identified TLU tasks that middle school students are expected to perform in English-medium
secondary school contexts by analyzing two main sources of data. First, English language standards/curricula and textbooks
from Chile, China, France, Korea, and Japan were reviewed along with ELP standards for English learners in US
middle schools (i.e., California, Colorado, Florida, New York, and Texas state standards and the WIDA consortium standards).
Appendices A-D summarize the results of the curricula and standards reviews for each of the four language skills
(listening, reading, speaking, and writing). The content for each skill has been categorized into three domains, which 
are discussed later in this section. Second, the existing academic literature on language used in academic contexts was 
reviewed. Research from the two aforementioned sources has identified important real-world tasks at the middle school 
level as well as skills needed to complete those tasks. It has also indicated that TLU tasks in an academic context can be 
categorized into three domains related to the purpose of language use. The three domains identified and considered in 
our test design are (a) social and interpersonal, (b) navigational, and (c) academic. In the following section, a brief summary
of literature that supports our rationale for categorizing the three language use domains is provided. Next, the three
domains are defined and illustrated with real-life language use examples. 

Literature on the Language Demands of Academic Contexts 

As mentioned earlier, the construct targeted by the TOEFL Junior test is the English ability needed to study in an
English-medium middle school. Efforts to describe the language that students use in school can be traced back to Cummins’s
(1980, 1981) seminal work. Cummins differentiated social language ability, labeled as basic interpersonal communication
skills (BICS), from more cognitively demanding, decontextualized language ability, which he labeled as cognitive academic
language proficiency (CALP). Even though there have been critiques of the legitimacy of viewing language use (i.e., CALP)
as decontextualized, the BICS-CALP categorization has had a significant influence on how we understand the language
demands that students face in English-medium instructional environments. More importantly, Cummins’s categories 
have spawned research that has sought evidence that academic language proficiency is distinguishable from the language 
proficiency needed for social and interpersonal purposes. In turn, this research has led to the definition and identification 
of the characteristics of academic language proficiency. 

The research findings support the conclusion that general language proficiency tests do not necessarily capture
language skills needed for academic study. First, students do not necessarily perform equally well on (a) standardized 
content assessments (e.g., math, science, and social studies) given in English and (b) English language development (ELD) 
assessments mandated for all English learners attending US schools (Butler & Castellon-Wellington, 2005; Stevens, Butler, 
& Castellon-Wellington, 2000). Second, the language measured in ELD assessments does not adequately represent the
language used in standardized content assessments. In other words, existing ELD assessments, which many US states
have used for identifying, classifying, and reclassifying English learners, have been found to be limited with respect to 
measuring the range of language ability required to take content assessments (Butler, Stevens, & Castellon-Wellington, 
2007). Third, the language assessed in ELD assessments does not always accurately represent the language actually used in 
classes (Schleppegrell, 2001). These findings indicate that many widely used ELD assessments do not accurately measure 
the language ability required for students’ participation in English-medium academic settings. If these findings support a
conceptualization of academic language proficiency as distinct from but related to general language proficiency, then the next
question is how to characterize this ability. 

Chamot and O’Malley (1994) defined academic English as “the language that is used by teachers and students for the 
purpose of acquiring new knowledge and skills . . . imparting new information, describing abstract ideas, and developing
students’ conceptual understanding” (p. 40). Although this definition provides a general concept of academic
English, other researchers have explored more specific characteristics and expanded the definition of academic English. 
For instance, Schleppegrell (2001) identified specific linguistic features that are encountered in school-based texts (e.g., 
nominalizations, technical lexical choices). Scarcella (2003) further listed various features of academic English from discrete
linguistic features (phonological, lexical, and grammatical features) and language functions (sociolinguistic features)
to stylistic register (discourse features). In doing so, Scarcella attempted to establish a competence-based framework of 
academic English proficiency drawn from prior communicative competence research (e.g., Bachman, 1990; Bachman & 
Palmer, 1996; Canale & Swain, 1980). A fully comprehensive characterization of academic English, however, remains to 
be developed. Nonetheless, the evidence collected thus far shows that the difference between language used for general 
purposes and that used for academic purposes is relative, at both the linguistic and cognitive levels, with complex sentence
structures and specialized vocabulary being used relatively more frequently in academic language (Bailey, 2007;
Cummins, 2000). 

However, it should be noted that the aforementioned literature on academic language proficiency does not undermine 
the importance of English for social and interpersonal purposes. Social language remains an important, foundational 
element of the language proficiency needed in school settings. Therefore, the TOEFL Junior test aims to measure the 
full range of language uses that students encounter in English-medium school settings. In other words, TOEFL Junior 
acknowledges the complex and multifaceted nature of the language that students need to learn in school contexts. 

As noted previously, three domains of language use are identified and considered in the TOEFL Junior test design: social
and interpersonal, navigational, and academic. These domains are based on Bailey and colleagues’ extensive research on 
school language (Bailey, 2007; Bailey, Butler, LaFramenta, & Ong, 2004; Bailey & Heritage, 2008). In particular, Bailey 
and Heritage’s (2008) tripartite categorization of school language has been found to be most consistent with what the test 
design team identified from its review of standards and curricula. Bailey and Heritage further divided academic English, 
which corresponds to CALP in Cummins’s (1980, 1981) bipartite categorization, into school navigational language (SNL)
and curriculum content language (CCL). They defined SNL as the language needed for classroom management, whereas 
CCL was defined as “the language used in the process of teaching and learning content material” (p. 15). SNL and CCL 
correspond to the navigational and the academic domain, respectively, in TOEFL Junior. More detailed discussion of how 
the three domains are defined and operationalized in the test is provided in the next section. 

Target Language Use (TLU) Domains for TOEFL Junior 

The TLU domain for TOEFL Junior (i.e., English-medium middle school environments) is divided into three
subdomains (social and interpersonal, navigational, and academic) based on the rationales discussed in the previous
section. It should be acknowledged that these three domains are fluid and cannot be clearly differentiated in all
language use situations; the distinctions among the three domains can oversimplify the very complex process of language 
use. Note that in Figure 1, the lines representing the subdomains are dotted to symbolize the fuzzy boundaries among 
the domains. In addition, there is an overlap with respect to the characteristics of language required in each of the three 
domains. For example, there is likely to be a threshold level of grammatical knowledge that is fundamental for language 
use irrespective of the specific language use domain. However, despite its imperfections, we believe that this classification 
is effective for describing the wide range of language use activities in secondary-level English-medium school settings. 
Besides differing with regard to the functions or purposes of language activities, the three domains also differ in terms of
the characteristics of language (e.g., word choice, complexity of sentence structures), which are discussed in more detail
in the section on construct definition. Finally, the academic subdomain is believed to play a more significant role than the
other two domains in students’ success in academic settings, and that is why the area representing the academic domain in
Figure 1 is larger than those representing the other two domains. This interpretation is also reflected in the test blueprint,
with more items tapping the academic domain than the other domains. Emphasizing the academic domain in the test is
also believed to have a beneficial influence on test takers, motivating them to focus their language study on the areas that
have been found in the academic English literature to be more difficult to master (Bailey, 2007; Cummins, 2000).

Figure 1 Defining the target language use (TLU) domain of the TOEFL Junior test.

Y. So et al. TOEFL Junior® Design Framework
TOEFL Junior Research Report No. 02 and ETS Research Report No. RR-15-13. © 2015 Educational Testing Service

The three TLU subdomains are defined as follows. 

Communicating in English for Social and Interpersonal Purposes 

This subdomain encompasses uses of language for establishing and maintaining personal relationships. For example, students
participate in casual conversations with their friends in school settings where they have to both understand other
speaker(s) and respond appropriately. Students sometimes exchange personal correspondence with friends or teachers. 
The topics may include familiar ones, such as family, routine daily activities, and personal experiences. The tasks in this 
domain tend to involve informal registers of language use. 

Communicating in English for Navigational Purposes 

In school contexts, students communicate with peers, teachers, and other school staff about school- and
course-related materials and activities but not about academic content. For example, students communicate about homework
assignments to obtain and/or clarify details. In some cases, they need to extract key information from school-related
announcements. That is, students need to communicate to navigate school or course information. The second subdomain 
captures this specific purpose of communication. 

Communicating in English for Academic Purposes 

This subdomain entails language activities performed to learn academic content in English. Language functions such 
as summarizing, describing, analyzing, and evaluating are typically needed to learn academic content. The topics may be 
discipline related, including science, math, and social studies. Examples of this language use include comprehending ideas 
in lectures or class discussions, participating in short conversations about academic content in a class, comprehending 
written academic texts, and summarizing oral or written academic texts. Language used for this purpose typically involves 
more formal and technical registers with increased syntactic complexity. 

Construct Definition 

A Model of Language Knowledge 

As discussed in the previous section, TOEFL Junior measures how successfully a test taker can complete test tasks that 
are designed to represent the range of communicative tasks encountered in English-medium middle schools. Among 
the many factors that may contribute to a test taker’s success (e.g., cognitive ability, background knowledge, strategic 
competence), language ability—the target construct of the TOEFL Junior test—should be the main factor influencing 
successful test task completion. 

Bachman and Palmer’s (2010) model of language knowledge provides the test design team with a useful frame of
reference for conceptualizing language ability and for designing individual test tasks and the test’s organization.
In particular, the breadth of the model makes it possible to (a) recognize the complex nature of the target construct,
(b) identify specific component(s) of language knowledge that test tasks are designed to measure, (c) describe the specific 
features of reading/listening passages, and (d) specify the expected characteristics of the test takers’ responses to speaking 
and writing test tasks. 

As shown in Figure 2, Bachman and Palmer’s (2010) model of language knowledge consists of two broad categories:
organizational knowledge and pragmatic knowledge. Organizational knowledge refers to knowledge about the formal structure
of a language; it is further divided into grammatical knowledge, which is needed to interpret and produce individual
sentences, and textual knowledge, which is needed to interpret and produce cohesive longer discourse. The second category,
pragmatic knowledge, is the knowledge needed for a language user to produce and/or process language appropriately
in relation to other variables such as the language users’ intentions and situational factors. This category is further divided
into functional and sociolinguistic knowledge.

Organizational knowledge

Grammatical knowledge
- Knowledge of vocabulary
- Knowledge of syntax
- Knowledge of phonology/graphology

Textual knowledge
- Knowledge of cohesion
- Knowledge of rhetorical or conversational organization

Pragmatic knowledge

Functional knowledge
- Knowledge of ideational functions
- Knowledge of manipulative functions
- Knowledge of heuristic functions
- Knowledge of imaginative functions

Sociolinguistic knowledge
- Knowledge of genres
- Knowledge of dialects/varieties
- Knowledge of registers
- Knowledge of natural or idiomatic expressions
- Knowledge of cultural references and figures of speech

Figure 2 Bachman and Palmer’s (2010) model of language knowledge. Adapted from Language Assessment in Practice: Developing
Language Assessments and Justifying Their Use in the Real World, by L. Bachman and A. Palmer, 2010, p. 45. Copyright 2010 by Oxford
University Press.
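
The two-level structure of the model described above can be written down as a nested mapping. The following is only an illustrative sketch: the names `LANGUAGE_KNOWLEDGE`, `find_category`, and `count_areas` are hypothetical and are not part of the TOEFL Junior framework or of Bachman and Palmer’s (2010) work; the entries simply restate the knowledge areas listed in Figure 2.

```python
# Hypothetical encoding of the model of language knowledge shown in Figure 2.
# Top level: the two broad categories; second level: their subcategories;
# leaves: the individual knowledge areas listed in the figure.
LANGUAGE_KNOWLEDGE = {
    "organizational": {
        "grammatical": ["vocabulary", "syntax", "phonology/graphology"],
        "textual": ["cohesion", "rhetorical or conversational organization"],
    },
    "pragmatic": {
        "functional": [
            "ideational functions",
            "manipulative functions",
            "heuristic functions",
            "imaginative functions",
        ],
        "sociolinguistic": [
            "genres",
            "dialects/varieties",
            "registers",
            "natural or idiomatic expressions",
            "cultural references and figures of speech",
        ],
    },
}


def find_category(area):
    """Return (broad category, subcategory) for a knowledge area, or None."""
    for broad, subcategories in LANGUAGE_KNOWLEDGE.items():
        for subcategory, areas in subcategories.items():
            if area in areas:
                return broad, subcategory
    return None


def count_areas():
    """Count the individual knowledge areas across all subcategories."""
    return sum(
        len(areas)
        for subcategories in LANGUAGE_KNOWLEDGE.values()
        for areas in subcategories.values()
    )
```

For instance, `find_category("registers")` locates that area under pragmatic, sociolinguistic knowledge, which mirrors how the design team could tag the component of language knowledge that a given test task is meant to measure.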

It should be pointed out that not all of the areas of language knowledge in Figure 2 are considered appropriate or 
equally important for inclusion and measurement for the TOEFL Junior intended population. For example, knowledge 
of cultural references is inappropriate because it can be a source of between-group test bias. In addition, some areas of 
language knowledge form a fundamental basis for language users to perform certain tasks using language, whereas other 
areas require a certain level of mastery of the first type of knowledge to be appropriately used in context. The knowledge 
of words and sentence structures of a language (i.e., grammatical knowledge in the Bachman & Palmer, 2010, model) is 
an example of the former type of knowledge, whereas the ability to participate in a conversation appropriately by understanding
the context-appropriate meaning of an utterance and responding to it appropriately (i.e., functional knowledge
in the Bachman & Palmer, 2010, model) is an example of the latter type of knowledge, which requires a foundation in the 
former. In designing the TOEFL Junior test, the former type of language knowledge is categorized as enabling skills and 
is considered to be fundamental to any communicative language use. Therefore, except in the TOEFL Junior Standard 
Language Form and Meaning section (to be discussed later), enabling skills were considered in defining the language 
demands of communication tasks that students are likely to perform in TLU situations. 

An example presented in the next section illustrates how this language knowledge model has informed the design of 
test tasks, and in particular, how it has helped to maximize their comparability to actual TLU tasks. 

Linking Test Tasks to Target Language Use (TLU) Tasks 

Upon reviewing TLU tasks that were identified through the curricula and standards review (see Appendices A-D), a set 
of TLU tasks was sampled to serve as the basis for the design of test tasks. Each section of the test was developed with 
tasks that, collectively, would provide evidence about a test taker’s competence in communicating in English in all three 
of the TLU subdomains defined in the previous section. In operationalizing each potential task, efforts were made to 
ensure that the linguistic characteristics of each task stimulus (e.g., a listening passage) and its expected response (e.g., a 
spoken constructed-response) were as similar as possible to the language knowledge required to perform a similar task 
in a nonassessment situation in an English-medium middle school context, as represented in Figure 3. 

The example in Figure 3 demonstrates how Bachman and Palmer’s (2010) framework of language task characteristics
guided test design. As illustrated in Figure 3, efforts were made to reproduce both the situational and the linguistic
characteristics of the TLU tasks in the test tasks to the greatest extent possible. In particular, in describing the linguistic
characteristics of the input and expected responses, the test development team used the language model discussed in the 
previous section (Figure 2). 

Organization of the Test Into Sections 

A discussion of the organizational structures of the tests is provided in this section, with individual presentations of the 
two TOEFL Junior tests and the sections included in each. As summarized in Tables 1 and 2, two sections, listening and 
reading, appear in both tests, whereas other sections appear in only one of the tests. The language form and meaning 
section is only present in TOEFL Junior Standard, whereas the speaking and writing sections are included only in TOEFL 
Junior Comprehensive. 

The decision to organize the test by modality (i.e., reading, listening, speaking, and writing) was made mainly because 
most curricula and textbooks currently in use are organized in this manner (Appendices A-D). It is expected, therefore, 
that stakeholders will find it useful to receive information about each modality. However, the design team also acknowledged
that, in real life, multiple language modalities are often required to complete a single language use task. Hence,
integrated tasks, which require multiple modalities (e.g., listening and speaking), are also included in the speaking and
writing sections of the TOEFL Junior Comprehensive test. In addition, the decision was made to include the language
form and meaning section in the TOEFL Junior Standard test to indirectly measure students’ ability to use their knowledge
of English grammar and vocabulary in speaking and writing, as these abilities cannot be easily operationalized in a
constructed-response format on a paper-delivered test.

TLU task characteristics

Follow and recount basic/routine oral instructions, procedures, or assignments

Situational characteristics
- Setting: classroom, library, field trip location, administrator's office, etc.
- Participant: student and teacher/administrator/peer
- Content: school trip, homework, school announcement, sports practice/game, school club activity, etc.

Linguistic characteristics of input
- Grammatical knowledge: Knowledge about general academic language (less formal than content-specific academic language, but more formal than everyday language)
- Textual knowledge: Knowledge about the conventions for marking inter-sentential relationships and for organizing units of information into a coherent text; mostly monologic with sporadic interruptions
- Functional knowledge: Knowledge of using language for ideational functions
- Sociolinguistic knowledge: Register in the middle of the formality continuum

Linguistic characteristics of expected output
- Grammatical knowledge: Knowledge about general academic language
- Textual knowledge: a monologue or dialogue depending on whether the discourse triggers follow-up questions
- Functional knowledge: Knowledge of using language for ideational functions in order to deliver information
- Sociolinguistic knowledge: Register in the middle of the formality continuum

Test task characteristics

Nonacademic Listen-Speak

Situational characteristics
- Setting: imaginary setting with contextual information provided
- Participant: test taker (speaking to an imaginary friend/teacher/parent)
- Content: school trip, homework, school announcement, sports practice/game, school club activity, etc.

Linguistic characteristics of input
- Grammatical knowledge: Knowledge about general academic language (less formal than content-specific academic language, but more formal than everyday language)
- Textual knowledge: Knowledge about the conventions for marking inter-sentential relationships and for organizing units of information into a coherent text; mostly monologic with sporadic interruptions
- Functional knowledge: Knowledge of using language for ideational functions
- Sociolinguistic knowledge: Register in the middle of the formality continuum

Linguistic characteristics of expected output
- Grammatical knowledge: Knowledge about general academic language
- Textual knowledge: a monologue
- Functional knowledge: Knowledge of using language for ideational functions in order to deliver information
- Sociolinguistic knowledge: Register in the middle of the formality continuum

Figure 3 An example of linking a target language use (TLU) task to an assessment task.


Table 1 Overall Structure of TOEFL Junior Standard

                                     No. of items
Section                      Operational a   Variable a   Total   Testing time
Listening comprehension      30              12           42      40 min
Language form and meaning    30              12           42      25 min
Reading comprehension        30              12           42      50 min
Total                        90              36           126     1 h 55 min

a The operational items are those that are considered for the official score reports of the test, whereas the variable items are those included
in the test for trial purposes. In other words, students’ responses to the variable items are reviewed to ensure that they can be used as
operational items in the future.


Construct Definition by Section 

This section presents detailed information about the definitions of the constructs for each of the test sections. The section 
is arranged in the order of language form and meaning, listening, reading, speaking, and writing, so that the first three 
sections are the ones included in the TOEFL Junior Standard test and the latter four sections (i.e., listening through 
writing) are the ones in the TOEFL Junior Comprehensive test. More information about how each of the two TOEFL 
Junior tests was operationalized is provided in the next section. 

Table 2 Overall Structure of TOEFL Junior Comprehensive

                   No. of items/tasks
Section     Operational   Variable   Total   Testing time
Listening   28            8          36      35 min
Reading     28            8          36      40 min
Speaking    4             n/a        4       25 min a
Writing     4             n/a        4       40 min a
Total       64            16         80      140 min

a The testing time includes both administration time, which allows test takers to process the stimulus input and prepare for their
responses, and response time, when test takers produce their responses.


Language Form and Meaning 

This section, included in the TOEFL Junior Standard only, is differentiated from other sections in the TOEFL Junior test 
in that test items in the section aim to measure enabling skills required for communication, whereas items and tasks in 
the other sections measure the ability to apply such enabling skills in actual communicative tasks. Specifically, the items 
in this section assess the degree to which students can identify the structure of English and choose appropriate lexical 
units. The items are presented as gap-filling questions within the context of a cohesive paragraph. Therefore, students are
required to take into account the context of an entire passage to answer the questions in this section appropriately.

It should also be noted that this section intends to indirectly measure students’ ability to use their grammar and vocabulary
knowledge for communication in a test where the productive skills (i.e., speaking and writing) are not directly
measured. In other words, the ability measured in this section has an association, at least to some extent, with students’ 
ability to apply such knowledge of English grammar and vocabulary to speaking and writing tasks. 

The items are divided into two categories: items targeting language meaning and items targeting language form. As 
explained in the following, vocabulary and grammar knowledge was measured in the context of a single paragraph, with 
the justification that the model of language knowledge (see A Model of Language Knowledge and Figure 2) can be better 
operationalized in a rich context than through decontextualized, individual sentences: 

1. The ability to identify an appropriate lexical item within context. Students must be able to identify a word that semantically
completes a sentence within the context of a paragraph.

2. The ability to recognize a proper grammatical structure within context. Students must be able to identify a proper 
structure needed to complete a grammatically accurate sentence in English. 

Listening 

TOEFL Junior assesses the degree to which students have the listening skills required to function in English-medium 
instructional environments. In such contexts, students are exposed to a wide range of aural input, for example, from 
personal conversations to lectures on academic content. Features specific to spoken discourse that distinguish it from 
written discourse include repetition, relatively complex verb structures, relatively little nominalization, and occasional 
performance disfluencies. Therefore, it is essential for successful participation in school that students gain familiarity with 
spoken discourse features and attain listening proficiency sufficient to comprehend different genres of spoken discourse. 
Moreover, to succeed in school, students need to understand the main ideas and important details, make inferences based 
on what is implied but not explicitly stated, make predictions based on what the speaker says, understand a speaker’s 
purpose, and correctly interpret such features of prosody as intonation and contrastive stress. Three types of listening 
ability were defined to capture these skills and language features: 

1. The ability to listen for social and interpersonal purposes. Students must be able to comprehend conversations on 
familiar topics about day-to-day matters that take place in a school setting, such as sharing experiences with their 
peers. 

2. The ability to listen for navigational purposes. Students must be able to comprehend the language that teachers and 
other school staff produce for a range of purposes other than presenting academic content. This includes language 
that takes place both inside and outside of the classroom (e.g., in the school library or auditorium or on field trips) 
and that fulfills a range of speech functions (e.g., providing directions, making announcements, giving reminders,
issuing invitations, giving warnings). 

3. The ability to listen for academic purposes. Students need to comprehend ideas presented in a lecture or discussion
based on academic material. Though TOEFL Junior requires students to comprehend oral input such as that
needed to learn new ideas in an English-medium classroom, it does not require subject-specific background knowledge
in any given content area. In the domain of science, for example, such speech includes key terms, structures,
and concepts that enable middle school students to access academic content (terms such as evidence and investigation,
concepts such as making observations, reports on the results of an experiment, and a range of structures
for expressing these concepts), but does not include specific content or concepts that would be taught as part of a 
specific science curriculum (e.g., photosynthesis or geotropism). However, it is construct-relevant to include such 
concepts in the assessment if they are presented, explained, and reinforced so that a proficient listener can learn 
their meanings from the academic speech contained in the stimulus. 

Reading 

TOEFL Junior assesses the degree to which students have mastered the reading skills required for English-medium instructional
environments. The review of English language curricula and language objectives in reading (Appendix B) indicates
that a wide range of reading subskills are expected of students, including understanding main ideas, identifying important 
details, and making inferences. In addition, the curricula and standards specify different types of text. A relationship was 
observed between text types and the three TLU subskills. Therefore, the three reading abilities to be measured in TOEFL 
Junior are defined as follows, according to text type: 

1. The ability to read and comprehend texts for social and interpersonal purposes. Students should be able to read and 
comprehend written texts on familiar topics in order to establish or maintain social relationships. Text types for this 
purpose may include correspondence (e.g., e-mail, letters) and student writing. In addition, reading for personal 
pleasure (e.g., novels, periodicals) is included in this category. 

2. The ability to read and comprehend texts for navigational purposes. Students need to be able to read and comprehend 
texts in order to identify key information from informational texts for future reference. Such texts include those 
containing school-related information, usually in less linear formats (e.g., directions, schedules, written announcements,
brochures, and advertisements). Reading subskills that are particularly relevant to this type of reading include
comprehending explicit meaning, identifying key information, and understanding steps and procedures. 

3. The ability to read and comprehend academic texts. Students need to be able to read and comprehend academic 
texts in a range of genres (e.g., expository, biographical, persuasive, literary) across a range of subject areas (e.g., 
arts/humanities, science, social studies). They need to be able to read such texts at difficulty levels up to and including 
those typical of what is used in English-medium classrooms. In reading these texts, students need to be able to 
understand the main ideas and the key supporting information, to make inferences based on what is implied but not 
explicitly stated, and to understand key vocabulary (either from previous knowledge or from context) and cohesive 
elements within the text (i.e., referential relationships across sentences). Depending on the nature of the specific 
text, students may also need to understand an author’s purpose, follow the logic and the intended meaning of basic 
rhetorical structures, and/or identify and understand figurative language. As with listening, reading texts will not 
require any specific background or prior knowledge but will sometimes require students to read in order to learn 
new information in an academic context. 

Speaking 

TOEFL Junior Comprehensive assesses the degree to which students have the speaking skills required by English-medium 
instructional environments. This includes three abilities: 

1. The ability to use spoken English for social and interpersonal purposes. Students must be able to communicate orally 
in routine tasks and situations encountered in the school environment. For example, this includes the ability to 
communicate personal information, needs, and opinions on a wide range of familiar topics (such as hobbies, food, 
weather, and extracurricular events). 

2. The ability to use spoken English for navigational purposes to exchange classroom-related information. Students must
be able to engage in discussions and interactions on topics related to learning activities. This includes the ability to 
make requests, ask for assistance or information, participate in group activities, and convey simple directions and 
instructions. 

3. The ability to use spoken English for academic purposes to communicate about and demonstrate knowledge of academic 
course content. Students must be able to participate in classroom activities to convey academic knowledge. This 
includes the ability to respond to oral questions about academic content and to convey information heard or read 
in an academic context. 

Writing 

TOEFL Junior Comprehensive assesses the degree to which test takers have the writing abilities required by English-medium
instructional environments at the middle school level. This includes three types of ability:

1. The ability to write in English for social and interpersonal purposes. In English-medium instructional environments, 
students must be able to engage in written communication for the purposes of establishing and maintaining social 
and interpersonal relationships. This includes the ability to write effective informal correspondence to peers or 
teachers and the ability to recount events based on personal experience and observation. 

2. The ability to write in English for navigational purposes. In school settings, students must be able to extract key school-related
information from a variety of spoken or written stimuli and keep written records for future reference. For
instance, students may need to take notes while listening to their teacher explain a class assignment or the steps of 
a science experiment. Students may also need to write simple, short summaries of school-related information (e.g., 
a field trip, announcements, directions, or procedures). 

3. The ability to write in English for academic purposes. In English-medium instructional environments, students must 
be able to communicate in writing using appropriate written language on subject matters representing a range of 
content areas and genres. This includes the ability to produce connected text; to describe a process in an academic 
context; to understand and be able to summarize, synthesize, and paraphrase important and relevant information
from spoken and written stimuli; and to integrate information from multiple academic spoken and/or written
stimuli. 


Operationalizing the Construct 

In this section, the overall structures of TOEFL Junior Standard and TOEFL Junior Comprehensive are described, followed 
by a more detailed explanation of the structure of each section of the tests. In particular, this section describes how the 
constructs, the TLU subdomains, and the tasks are operationalized in TOEFL Junior. 


Overall Structure of the Test 

The overall structures of TOEFL Junior Standard and TOEFL Junior Comprehensive are presented in Tables 1 and 2, 
respectively, with information on the sections included and the number of items/tasks and the allotted time in different 
sections. 

As briefly discussed in the introduction to this report, TOEFL Junior Standard consists of all selected-response questions
and is delivered in paper-and-pencil format. On the other hand, TOEFL Junior Comprehensive is administered on
a computer and consists of both selected-response and constructed-response questions. The receptive skills (i.e., listening 
and reading) are measured through selected-response questions and the productive skills (i.e., speaking and writing) are 
measured through constructed-response questions. 

In each section of TOEFL Junior, with the exception of the language form and meaning section in TOEFL Junior 
Standard, items are selected to tap into the target construct in each of the three TLU subdomains: social and interpersonal, 
navigational, and academic. Details on the section structures in relation to the TLU subdomains and specific language 
skills are described in the following section. 
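
The section counts and testing times reported in Tables 1 and 2 can be held in a small data structure and totaled programmatically. The following is a minimal sketch under stated assumptions: the class name `Section`, the lists `STANDARD` and `COMPREHENSIVE`, and the function `summarize` are hypothetical names introduced here for illustration and are not part of any ETS tool. Only the per-section figures come from the tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Section:
    name: str
    operational: int         # items or tasks that count toward reported scores
    variable: Optional[int]  # trial items; None where the table shows "n/a"
    minutes: int             # allotted testing time in minutes

    @property
    def total(self) -> int:
        return self.operational + (self.variable or 0)


# Per-section figures as listed in Table 1 (hypothetical container name).
STANDARD = [
    Section("Listening comprehension", 30, 12, 40),
    Section("Language form and meaning", 30, 12, 25),
    Section("Reading comprehension", 30, 12, 50),
]

# Per-section figures as listed in Table 2 (hypothetical container name).
COMPREHENSIVE = [
    Section("Listening", 28, 8, 35),
    Section("Reading", 28, 8, 40),
    Section("Speaking", 4, None, 25),
    Section("Writing", 4, None, 40),
]


def summarize(sections):
    """Add up item counts and testing time across the sections of one test."""
    return {
        "operational": sum(s.operational for s in sections),
        "variable": sum(s.variable or 0 for s in sections),
        "total": sum(s.total for s in sections),
        "minutes": sum(s.minutes for s in sections),
    }


if __name__ == "__main__":
    for label, sections in (("Standard", STANDARD), ("Comprehensive", COMPREHENSIVE)):
        print(label, summarize(sections))
```

Recomputing the totals from the per-section rows in this way is a simple consistency check on the Total rows of the two tables.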

Table 3 Structure of Language Form and Meaning Section

                     No. of operational items
Language meaning     8-14
Language form        16-22
Total                30


Section Structures: Language Form and Meaning, Listening, Reading, Speaking, and Writing 

This section is divided into five subsections, each of which focuses on one of the five sections appearing in TOEFL Junior: 
language form and meaning, listening, reading, speaking, and writing. The first three sections are included in TOEFL 
Junior Standard, and the latter four sections (starting from listening) are in TOEFL Junior Comprehensive. Because the 
listening and reading sections appear in both of the TOEFL Junior tests, they are discussed only once. However, it should 
be noted that there are slight differences in operationalizing the two sections in the two tests, such as the number of items. 
These differences are summarized in tables wherever applicable. 

Language Form and Meaning 

In this section, test takers are given reading passages in which words have been purposefully deleted so that students 
must fill in the blanks by choosing an answer among four options to complete the text appropriately. The passages can 
be one of the following types that middle school students are likely to encounter in their school lives: announcement, 
correspondence, advertisement, biographical, expository, or fiction. 

Each reading passage contains four to eight items, depending on the type and length. Longer passages (usually expository,
biographical, and fictional narrative texts) are eight to nine sentences in length and support six to eight items.
Shorter passages (e.g., announcement, correspondence, and advertisement) are four to five sentences in length and support
four questions. All items in this section measure knowledge of language meaning or form. The number of items
targeting each of these constructs is summarized in Table 3. 

The language meaning items ask students to choose, from a set of four options, the one correct word that semantically
completes a sentence within the context of a passage. The language form items test a student’s ability to recognize
the proper structures needed to complete a grammatically accurate sentence in English. Both types of items, collectively, 
encompass a wide range of English vocabulary and grammar by including items targeting a variety of language categories. 
The vocabulary items encompass different parts of speech (noun, verb, adjective, adverb, determiner, conjunction, and 
preposition), and the grammar items include questions about sentence structure (e.g., correct subject and object forms, 
subject-verb agreement), verb form (e.g., tense and aspect), passive/active voice, relative clauses, word order, and comparative/superlative
forms. Including a variety of language aspects was considered important in test design to ensure that
students have a broad understanding of the English language. The language features measured in the section were chosen 
from among those that are taught in English curricula, and their difficulty was gauged by expert judgment. 
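
The item counts described above (four to eight items per passage, and the ranges in Table 3) can be expressed as a simple check on a candidate set of passages. This is only an illustrative sketch: the function `check_lfm_form` and the constant names are hypothetical, and the actual form assembly procedure is not described in this document.

```python
# Ranges taken from Table 3 and the passage description above.
MEANING_RANGE = (8, 14)      # operational language meaning items
FORM_RANGE = (16, 22)        # operational language form items
SECTION_TOTAL = 30           # operational items in the section
ITEMS_PER_PASSAGE = (4, 8)   # items supported by a single passage


def check_lfm_form(passages):
    """Check a candidate section against the stated counts.

    `passages` is a list of (meaning_items, form_items) pairs, one per passage.
    Returns a list of problems; an empty list means all counts fit.
    """
    problems = []
    for number, (meaning, form) in enumerate(passages, start=1):
        n_items = meaning + form
        if not ITEMS_PER_PASSAGE[0] <= n_items <= ITEMS_PER_PASSAGE[1]:
            problems.append(f"passage {number}: {n_items} items")
    meaning_total = sum(meaning for meaning, _ in passages)
    form_total = sum(form for _, form in passages)
    if not MEANING_RANGE[0] <= meaning_total <= MEANING_RANGE[1]:
        problems.append(f"language meaning: {meaning_total} items")
    if not FORM_RANGE[0] <= form_total <= FORM_RANGE[1]:
        problems.append(f"language form: {form_total} items")
    if meaning_total + form_total != SECTION_TOTAL:
        problems.append(f"section total: {meaning_total + form_total} items")
    return problems


if __name__ == "__main__":
    # Two short passages, two long passages, and one mid-length passage.
    print(check_lfm_form([(2, 2), (2, 2), (3, 5), (3, 5), (2, 4)]))
```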

Listening 

In this section, test takers listen to aural stimuli and answer four-option multiple-choice questions presented after each 
stimulus. The number of questions per stimulus varies depending on the type of stimulus — three or four questions for 
short conversation stimuli, one question for classroom instruction stimuli, and four questions for academic listening 
stimuli. Note that the number of items of each type differs between TOEFL Junior Standard and TOEFL Junior Comprehensive,
as presented in Table 4.

Table 4 summarizes the relationships among stimulus type, TLU domain, and the subskills to be measured by a stimulus.
As shown in the first two columns of the table, there is a one-to-one correspondence between stimulus type and
the TLU subdomain that each stimulus type is targeting. The short conversations, classroom instruction, and academic 
listening stimuli are intended to measure test takers’ ability to communicate for social and interpersonal, navigational, 
and academic subdomains, respectively. 

Table 4 Listening Section Structure

                         Target language                                                 No. of operational items
Stimulus/input           use subdomain      Subskills measured                           Standard   Comprehensive
Short conversations      Social and         • Comprehending the main idea                11-12      12
                         interpersonal      • Identifying salient details
                                            • Making inferences
                                            • Making predictions
                                            • Identifying speaker’s purpose
                                            • Understanding a meaning conveyed by
                                              prosodic features
Classroom instruction    Navigational       • Comprehending the main idea                6-7        8
(monolog)                                   • Identifying salient details
                                            • Making inferences
                                            • Making predictions
Academic listening       Academic           • Comprehending the main idea                12         8
(monolog/discussion)                        • Identifying salient details
                                            • Making inferences
                                            • Making predictions
                                            • Identifying speaker’s purpose
Total                                                                                    30         28


In addition, each listening item aims to measure either (a) one of the common subskills or (b) one of the domain-specific
subskills. The common subskills refer to listening abilities that can be measured in any of the three TLU subdomains. For
example, a question about main idea can be based on a listening stimulus in any of the TLU subdomains. The domain-specific
subskills are operationalized to be measured in one or two of the TLU subdomains only. Specifically, the ability
to identify speaker’s purpose is operationalized for both the social and interpersonal and academic domains, whereas the
ability to understand a meaning conveyed by prosodic features is operationalized exclusively for the social and interpersonal
domain.
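
The distinction between common and domain-specific listening subskills can be written as a lookup from each subskill to the TLU subdomains in which it is operationalized. The sketch below is illustrative only: the identifiers (`SUBSKILL_SUBDOMAINS`, `is_valid_pairing`, `is_common`) and the short subskill labels are hypothetical, while the mapping itself follows Table 4 and the paragraph above.

```python
ALL_SUBDOMAINS = frozenset({"social_interpersonal", "navigational", "academic"})

SUBSKILL_SUBDOMAINS = {
    # Common subskills: can be measured in any of the three TLU subdomains.
    "main_idea": ALL_SUBDOMAINS,
    "salient_details": ALL_SUBDOMAINS,
    "inferences": ALL_SUBDOMAINS,
    "predictions": ALL_SUBDOMAINS,
    # Domain-specific subskills: limited to one or two subdomains.
    "speakers_purpose": frozenset({"social_interpersonal", "academic"}),
    "prosodic_meaning": frozenset({"social_interpersonal"}),
}


def is_valid_pairing(subskill: str, subdomain: str) -> bool:
    """True if an item targeting `subskill` may be based on a stimulus in `subdomain`."""
    return subdomain in SUBSKILL_SUBDOMAINS[subskill]


def is_common(subskill: str) -> bool:
    """True if the subskill is operationalized in all three TLU subdomains."""
    return SUBSKILL_SUBDOMAINS[subskill] == ALL_SUBDOMAINS


if __name__ == "__main__":
    print(is_valid_pairing("speakers_purpose", "navigational"))
    print(is_common("main_idea"))
```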

Reading 

In the reading section, as in the listening section, test takers are presented with reading materials and then with four-option 
multiple-choice questions. As summarized in Table 5, each stimulus type taps into one of the three TLU subdomains. In 
addition, each reading comprehension item is designed to measure one of the seven common subskills, which are listed in 
the third column of the table. Finally, as shown in the last column of Table 5, some stimulus types may not be included in 
a given operational test form. The single exception is the expository stimulus type: Every operational form includes two 
eight-item sets, each with an expository stimulus. 

Speaking 

The speaking section consists of four tasks, as summarized in Table 6. In each task, the total time, shown in the last column 
of the table, represents time provided for test takers to (a) process the stimulus input, either linguistic, nonlinguistic, or 
both; (b) prepare for their responses (i.e., preparation time); and (c) record their responses (i.e., speaking time). 

As shown in Table 6, each speaking task is designed to measure the test takers’ ability to communicate in one of the three
TLU subdomains. It should be noted that all of the tasks except the picture narration task require test takers to understand
language input, either written or spoken, to successfully complete the task, as shown in the integrated skills column in
Table 6. This was a conscious decision intended to ensure that three of the four tasks measure integrated language skills
for communication, better reflecting language use in the real world.


Table 5 Reading Section Structure

                   Target language                                                              No. of operational items
Stimulus/input     use subdomain              Subskills measured                                Standard   Comprehensive
Correspondence     Social and interpersonal   • Comprehending the main idea                     0 or 4     0 or 4
Nonlinear text     Navigational               • Identifying important supporting factual        0 or 4     0 or 4
                                                information
Journalism         Navigational               • Making inferences                               0 or 8     0 or 8
Expository         Academic                   • Discerning the meaning of low-frequency         16         16
                                                words or expressions from context
                                              • Recognizing an author’s purpose or use of
                                                particular rhetorical structures
                                              • Understanding figurative and idiomatic
                                                language from context
Total                                                                                           30         28

Note. The subskills in the third column are common to all stimulus types; they are not specific to the row in which they appear.


Table 6 Speaking Section Structure

Task                          Target language use subdomain   Integrated skills     Preparation time   Speaking time   Total time
Read Aloud                    Academic a                      Reading, speaking     1:00               1:00            3:30
Picture Narration             Social and interpersonal        n/a                   1:00               1:00            3:20
Nonacademic Listen-Speak      Navigational                    Listening, speaking   0:45               1:00            4:30
Academic Listen-Speak         Academic                        Listening, speaking   0:45               1:00            4:30

a Opinions may differ with respect to the appropriateness of categorizing the read aloud task as academic, because this task does not tap
directly into communicative skills but rather targets enabling skills (e.g., accuracy of pronunciation and intonation and fluency) that
form the basis for all speaking tasks. While we acknowledge this perspective, the read aloud task is categorized as academic in the test
design framework because the classroom is the most common context in which students are asked to read text aloud. In other words,
this task is one of the important tasks that students are commonly expected to perform in an academic context.


Table 7 Writing Section Structure

Task           Target language use subdomain       Integrated skills     Writing time   Total time
Editing        Navigational/Academic               Reading, writing      5:00           5:30
E-mail         Social and interpersonal            Reading, writing      7:00           7:30
Opinion        Social and interpersonal/Academic   n/a                   10:00          10:30
Listen-Write   Academic                            Listening, writing    10:00          14:30

Writing 

The writing section consists of four tasks. The tasks and the time allowed for each task are summarized in Table 7. In this 
section, the total time includes both time for test takers to process the stimulus input and time to produce their written 
responses. Unlike in the speaking section, time for test takers to prepare for their responses is not separately assigned in 
the writing section; instead, test takers use their response time for planning their writing (e.g., outlining), composing their 
responses, and finally, proofreading what they have written. 

Table 8 Scores on the TOEFL Junior Standard Score Report

                            Reported score range
                            (increments of 5 for       CEFR level and
Section/overall             the section scale scores)  can-do statements      Additional information
Listening                   200-300                    Below A2, A2, B1, B2   n/a
Language form and meaning   200-300                    Below A2, A2, B1, B2   n/a
Reading                     200-300                    Below A2, A2, B1, B2   Lexile score 510L-1150L
Overall score level         1-5                        n/a                    Overall performance descriptor and
                                                                              CEFR profile for the three sections

Note. CEFR = Common European Framework of Reference.



Table 9 Scores on the TOEFL Junior Comprehensive Score Report

                      Reported score range   CEFR level and
Section/overall       (increments of 1)      can-do statements      Additional information
Reading               140-160                Below A2, A2, B1, B2   Lexile score 510L-1150L
Listening             140-160                Below A2, A2, B1, B2   n/a
Speaking              0-16                   Below A2, A2, B1, B2   n/a
Writing               0-16                   Below A2, A2, B1, B2   n/a
Overall score level   1-6                    n/a                    Overall performance descriptor and
                                                                    CEFR profile for the four skills

Note. CEFR = Common European Framework of Reference.


As in the speaking section, each writing task is designed to measure the test takers’ ability to communicate in one or 
two of the three TLU subdomains, and three of the four writing tasks require the integration of language skills for their 
successful completion. 


Scoring System 

This section describes how each of the scores included on the score report was developed to provide reliable, meaningful,
and accessible information about test takers’ performance. In developing scores, the following considerations were taken 
into account: current practices in establishing score scales, results from the pilot study, and potential uses of the reported 
scores. 

A score report for both the TOEFL Junior Standard test and the TOEFL Junior Comprehensive test contains the following
information: overall score level, section scores for each of the sections, a Common European Framework of Reference
(CEFR; Council of Europe, 2009) level for each test section, can-do statements that describe what students can typically 
do at the scored CEFR level, and a Lexile score on the reading section. The can-do statements included in the score reports 
are adapted from the CEFR can-do statements (Council of Europe, 2009) and modified to make them more appropriate 
for the language use required for the target age group of the test. See Appendix E for a sample TOEFL Junior Comprehensive
score report. In addition, Tables 8 and 9 summarize the scores that are provided on the score reports for TOEFL
Junior Standard and TOEFL Junior Comprehensive, respectively. 

As summarized in Tables 8 and 9, the CEFR levels reported for each test section represent four levels: below A2 (the
lowest performance level measured by the test), A2, B1, and B2 (the highest performance level measured by the test).
These levels were established through standard-setting studies that ETS conducted separately for the two TOEFL Junior
tests.¹ Finally, for the reading section, another auxiliary score, the Lexile measure, is reported. The Lexile score is provided
so that a student can easily identify reading materials at an optimal level of difficulty to improve his or her reading skills. 
Information about the relationship between performance on the TOEFL Junior Reading section and the Lexile measure 
can be found in MetaMetrics (2012). 

In the next three sections, more detailed explanations are provided for the following three test development procedures: 
(a) section scores, (b) overall score levels and performance descriptors, and (c) scoring rubrics for the speaking and writing 
tasks. It should be noted that the last subsection, which is about the scoring rubrics, is relevant only to TOEFL Junior 
Comprehensive. 

Table 10 Number of Items, Raw Scores, and Scale Scores for the Two TOEFL Junior Tests

                                             Raw score             Scale score
Section                       No. of items   Range   Increments    Range     Increments
TOEFL Junior Standard
  Listening                   30             0-30    1             200-300   5
  LFM a                       30             0-30    1             200-300   5
  Reading                     30             0-30    1             200-300   5
TOEFL Junior Comprehensive
  Reading                     28             0-28    1             140-160   1
  Listening                   28             0-28    1             140-160   1
  Speaking                    4              0-16    1             n/a       n/a
  Writing                     4 b            0-16    0.5           n/a       n/a

a Language form and meaning. b One of the four writing tasks (editing) has two individual items. The average of the scores from the
two items is the score for the editing task. This procedure results in 0.5 increments in the raw writing scores.


Section Scores 
Section Raw Scores 

Table 10 summarizes information about the number of items in each section and the range and increments of raw 
and scaled scores. For the sections composed of selected-response items (language form and meaning, listening, and
reading), test takers earn one score point for each item answered correctly, while no points are earned for incorrect
responses or no response at all. As indicated in the table, the raw scores (i.e., the number of items answered correctly) 
are converted to scaled scores (discussed in the next section of this report), and only scaled scores are included in the 
score report. For the speaking and writing sections that consist of four constructed-response tasks each, each response 
is scored by a human rater on a holistic rubric scale of 0 to 4 (discussed in the section titled Scoring Rubrics of the 
Speaking and Writing Tasks). In particular, with reference to the descriptions in the scoring rubrics, the meanings of raw 
scores in the speaking and writing sections can be more easily interpreted than can the meanings of raw scores on the 
selected-response items. Therefore, it was deemed unnecessary to convert speaking and writing scores into scaled scores, 
and raw scores are reported for these two sections. 
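
The raw-score rules described above can be sketched in a few lines of code. The function names below are hypothetical, and two details are assumptions made for illustration rather than statements from this report: that holistic ratings are whole numbers from 0 to 4, and that each of the two editing items is rated on the same 0 to 4 scale before averaging (see the note to Table 10).

```python
def _check_ratings(ratings, low=0, high=4):
    """Raise ValueError if any rating falls outside the rubric scale (assumed 0-4)."""
    for rating in ratings:
        if not low <= rating <= high:
            raise ValueError(f"rating {rating} is outside {low}-{high}")


def selected_response_raw(responses, answer_key):
    """Number-correct score: one point per correct answer, none for wrong or omitted (None)."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer key differ in length")
    return sum(1 for given, correct in zip(responses, answer_key) if given == correct)


def speaking_raw(task_ratings):
    """Sum of the four holistic task ratings, giving a raw score from 0 to 16."""
    if len(task_ratings) != 4:
        raise ValueError("expected four speaking task ratings")
    _check_ratings(task_ratings)
    return sum(task_ratings)


def writing_raw(editing_item_ratings, other_task_ratings):
    """Editing task score is the mean of its two item ratings; the other three tasks are added."""
    if len(editing_item_ratings) != 2 or len(other_task_ratings) != 3:
        raise ValueError("expected two editing item ratings and three other task ratings")
    _check_ratings(editing_item_ratings)
    _check_ratings(other_task_ratings)
    editing_score = sum(editing_item_ratings) / 2
    return editing_score + sum(other_task_ratings)


if __name__ == "__main__":
    print(selected_response_raw(["A", None, "C", "D"], ["A", "B", "C", "A"]))
    print(speaking_raw([4, 3, 2, 1]))
    print(writing_raw((3, 4), (2, 3, 4)))
```

Averaging the two editing items is what produces half-point steps in the writing raw score, whereas the speaking raw score moves in whole points.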


Considerations for Scaled Score Development 

It is a common assessment practice that scaled scores, instead of raw scores, are reported in order to ensure that scores 
are comparable across test forms that may not have the same difficulty level (Kolen, 2006). As a best practice, scaled 
scores are created from raw scores with appropriate statistical adjustments for form difficulty; this enables scaled scores to 
hold their meaning over time and across different test forms. A variety of guidelines have been discussed in educational 
measurement literature about best practices for creating appropriate and meaningful scaled scores (Dorans, 2002; Kolen, 
2006). The following essential guidelines were considered in creating scaled scores for TOEFL Junior: 

1. Use distinctive scales that do not overlap with other scales, either between the two TOEFL Junior tests or with any 
other ETS tests, to avoid confusion and misuses. 

2. Make every item or raw score point in the meaningful raw score range count toward a scaled score point, if possible, 
to avoid loss of information that results from converting multiple raw score points to a single score point on the 
scale. 

3. Ensure that for every scaled score point, there is at least one item or one raw score point to avoid the unjustified 
differentiation of test takers. 

It is worth emphasizing that the first point was considered particularly important in the score scale development for the 
two TOEFL Junior tests. As discussed in the previous sections of this report, the two versions were developed to provide 
stakeholders with options to choose from as suited to their needs and purposes. However, we did not want the test scores 
from one version to be misinterpreted or misused in contexts where the use of the other version seemed more appropriate. 
This consideration provided the main rationale for developing different score scales for the two TOEFL Junior tests. 

In light of these considerations, scales for the selected-response sections in the TOEFL Junior Standard and Comprehensive 
tests were developed. One difference in the scales is that the resulting scaled scores range from 200 to 300 
in increments of 5 in the TOEFL Junior Standard test, whereas they range from 140 to 160 in increments of 1 in the 
TOEFL Junior Comprehensive test (see Table 10). Scores on any new test form will be equated and then reported on their 
respective scales. 
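
As a rough illustration of the two selected-response scales just described, the following sketch enumerates the reportable score points on each scale and checks a raw-to-scaled conversion table against the second and third guidelines. The conversion table used here is hypothetical and serves only to exercise the check; operational conversions result from equating and are not given in this report.

```python
def scale_points(low, high, step):
    """Return every reportable score point on a scale, both ends included."""
    return list(range(low, high + 1, step))


# Scale ranges and increments as stated in the text.
STANDARD_SCALE = scale_points(200, 300, 5)
COMPREHENSIVE_SCALE = scale_points(140, 160, 1)


def check_conversion(conversion, scale):
    """Check a raw-to-scaled table (dict: raw score -> scaled score).

    Guideline 2: no two raw scores collapse onto one scaled score.
    Guideline 3: no scaled score point is left without a raw score.
    """
    used = list(conversion.values())
    return {
        "no_collapsed_raw_scores": len(set(used)) == len(used),
        "no_unused_scale_points": set(scale) <= set(used),
        "all_on_scale": set(used) <= set(scale),
    }


# Hypothetical conversion for illustration only (not an operational table).
toy_conversion = {raw: 200 + 5 * raw for raw in range(21)}

if __name__ == "__main__":
    print(len(STANDARD_SCALE), len(COMPREHENSIVE_SCALE))
    print(set(STANDARD_SCALE).isdisjoint(COMPREHENSIVE_SCALE))  # guideline 1
    print(check_conversion(toy_conversion, STANDARD_SCALE))
```

Both scales offer the same number of score points even though their ranges do not overlap, which is consistent with the first guideline.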

Determining the Speaking and Writing Scales 

The speaking and writing sections each have four constructed-response items. Being few in number, these items are susceptible 
to memorization. This means that pretesting the constructed-response items would pose a test security risk. 
Consequently, conventional score equating that requires pretesting of items is not feasible for constructed-response items. 
In many testing programs that use constructed-response items only, conventional score equating is not performed. Instead 
of conventional score equating, quality control is maintained by trying out new items in small-scale sessions before they 
are used in the test (see Note 2), as well as through rigorous training of human raters and monitoring of their performance. These 
quality control methods are used to ensure quality and stability in the meaning of scores for the speaking and writing 
sections of the TOEFL Junior Comprehensive test. 

Because the speaking and writing section scores will not be equated, the scores are not strictly comparable, psychometrically 
speaking, across test forms, despite the aforementioned quality control measures that have been put in place. 
To avoid any incorrect impression on the part of stakeholders that the speaking and writing scores are comparable across 
forms, as the reading and listening section scores are, it was decided that the speaking and writing scales would be made 
clearly distinguishable from the reading and listening scales. In addition, to maximize the interpretability of the speaking 
and writing scales, speaking and writing scores are reported so as to be clearly associated with the performance levels that 
the scoring rubrics describe. 

Both the speaking and writing scaled scores range from 0 to 16 in increments of 1. The previously mentioned 
guidelines were followed in setting the scales. For speaking, each scaled score is associated with one and only one raw 
score. For writing, half points are rounded to the next higher whole number when calculating scaled scores (e.g., raw 
score 3 is set to scaled score 3; raw score 3.5 is set to scaled score 4). Because each speaking and writing response is scored 
on a 0-4 rubric scale (see Appendices F and G for scoring rubrics for the speaking and writing tasks) and the section 
score is the sum of the four item scores, dividing a scaled score by 4 yields a value that is compatible with the average item 
score; the corresponding scoring rubrics for this average item score may assist in understanding the typical characteristics 
of performance at this average item score level. 
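
The arithmetic described above can be sketched as follows. This is a minimal illustration, not the operational scoring procedure: the function names are hypothetical, and because the report does not state how half points arise in writing raw scores, the sketch simply accepts item scores in steps of 0.5.

```python
import math


def section_scaled_score(item_scores):
    """Sum four 0-4 item scores; round a half point up to the next whole number."""
    if len(item_scores) != 4:
        raise ValueError("expected four item scores")
    if any(not 0 <= score <= 4 for score in item_scores):
        raise ValueError("item scores must lie between 0 and 4")
    raw = sum(item_scores)
    # Half points go to the next higher whole number (e.g., 3.5 -> 4).
    return math.floor(raw + 0.5)


def average_item_level(scaled_score):
    """Divide a 0-16 section score by 4 to place it on the 0-4 rubric scale."""
    return scaled_score / 4


if __name__ == "__main__":
    score = section_scaled_score([3, 3, 2.5, 3])
    print(score, average_item_level(score))
```

A section score of 12, for instance, corresponds to an average item score of 3, so the rubric descriptors for a score of 3 indicate the typical characteristics of performance at that level.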

Overall Score Levels and Performance Descriptors 

Based on the section scores explained earlier, total scaled scores were calculated, by either summing the section scores 
(TOEFL Junior Standard) or developing a different total score scale (TOEFL Junior Comprehensive). However, there is a 
limit to the amount of information that a numeric, total scaled score can provide about a test taker’s language performance 
across different sections of a test. This becomes particularly clear given that many possible combinations 
of section scores could arrive at the same total scaled score. To overcome this limitation of total scaled scores, it was 
decided that overall score levels would be reported instead. The overall score levels are band scores, as discussed in the next 
subsection. They are intended to help test users better understand the test results and better interpret their meanings. The 
following two steps were followed in developing the overall score levels and level descriptors: (a) developing band levels 
and (b) developing performance descriptors. More details about the procedures can be found in Papageorgiou, Morgan, 
and Becker (2014) for the TOEFL Junior Standard test and in Papageorgiou, Xi, Morgan, and So (in press) for the TOEFL 
Junior Comprehensive test. 
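
To make concrete the point that many section-score combinations share one total, the sketch below counts the distinct section-score profiles behind each summed total for TOEFL Junior Standard. It assumes three sections (listening, language form and meaning, and reading), each reported on the 200 to 300 scale in increments of 5; the section count is an assumption of this illustration rather than something stated in this part of the report.

```python
from collections import Counter
from itertools import product

SECTION_SCALE = range(200, 301, 5)  # section scale as stated in the text
N_SECTIONS = 3                      # assumption: three selected-response sections


def profiles_per_total(n_sections=N_SECTIONS, scale=SECTION_SCALE):
    """Count how many distinct section-score profiles sum to each total score."""
    return Counter(sum(profile) for profile in product(scale, repeat=n_sections))


if __name__ == "__main__":
    counts = profiles_per_total()
    for total in (600, 750, 900):
        print(total, counts[total])
```

Under these assumptions, only the lowest and highest totals identify a single profile; totals near the middle of the range are shared by hundreds of different profiles, which is why a total score alone says little about performance in each section.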

Developing Overall Score Levels 

The main goal of this step was to determine the number of overall score levels and to set cut scores to classify test takers into 
levels both meaningfully and reliably. In the process, the following criteria were applied for TOEFL Junior Standard and 
TOEFL Junior Comprehensive, respectively. We note that the types of data considered were different, primarily because 
of the difference in structure between the two tests. However, the general procedures for the development of band levels 
were the same across the two tests. 

TOEFL Junior Research Report No. 02 and ETS Research Report No. RR-15-13. © 2015 Educational Testing Service 

Y. So et al. 

TOEFL Junior® Design Framework 

Table 11 Overall Score Levels, Performance Descriptors, and Common European Framework of Reference (CEFR) Profiles for TOEFL 
Junior Standard 

| Overall score level | Label | Overall performance descriptor | CEFR profile |
|---|---|---|---|
| | | These descriptions represent performance in middle schools which use English for instruction. A typical student at this level: | A typical student at this level achieved these section-level CEFR scores: |
| 5 | Superior | Consistently demonstrates comprehension of complex written and spoken materials, drawing on knowledge of complex language structures and vocabulary. | B2 for all sections |
| 4 | Accomplished | Often demonstrates comprehension of complex written and spoken materials, drawing on knowledge of complex language structures and vocabulary. | B1 for all sections |
| 3 | Expanding | Demonstrates comprehension of some complex written and spoken materials and most basic materials, drawing on knowledge of basic language structures and vocabulary. | Mostly B1 for all sections, but occasionally A2 |
| 2 | Progressing | Occasionally demonstrates comprehension of basic written and spoken materials, drawing on knowledge of basic language structures and vocabulary. | Mostly A2 for all sections, but occasionally A1 for reading and listening |
| 1 | Emerging | Can comprehend some very basic written and spoken texts, drawing on knowledge of basic language structures and vocabulary, but needs to further develop these language skills and comprehension abilities. | Mostly A1 for listening and reading; mostly A2 for language form and meaning |

In the development of overall score levels for the TOEFL Junior Standard test, which happened after the development 
of these levels for the TOEFL Junior Comprehensive test, it was decided that the number of overall score levels for the two 
TOEFL Junior tests should differ so as to prevent any misuse of the results, such as making direct comparisons between 
the score levels of the two tests (see the section titled Considerations for Scaled Score Development). The scores of 4,977 
students who took one of the two operational test forms of TOEFL Junior Standard in 2012 were used to develop the 
overall score levels. 

For TOEFL Junior Comprehensive, the following data, collected from the 2,931 students who participated in the 2011 
TOEFL Junior Comprehensive pilot administrations, were taken into consideration: (a) the means and standard deviations 
of the total scaled scores for each raw score point on the speaking and writing sections; (b) the means and standard 
deviations of the listening and reading section scores for each raw score point on the speaking and writing sections; and 
(c) the CEFR profiles of the four sections for each total scaled score — this information was also collected from a separate 
standard-setting study that set TOEFL Junior Comprehensive cut scores for the CEFR levels. 

For each of the tests, three proposals were developed to set the number of overall score levels and cut scores, and then 
the reliability of each proposal was estimated using RELCLASS (Livingston & Lewis, 1995). In addition, the CEFR profiles 
of the band levels for each solution were examined to provide an initial understanding of how proficiency progresses from 
lower to higher bands. A five-score-level solution (Table 11) and a six-score-level solution (Table 12) were finally selected 
for TOEFL Junior Standard and TOEFL Junior Comprehensive, respectively. 

Developing Overall Score-Level Performance Descriptors 

After making final decisions about the overall score levels for each of the TOEFL Junior tests, assessment specialists 
and researchers collaborated to develop performance descriptors that capture a typical student’s language proficiency 
within each overall score level. Following is the information that was taken into account in developing the performance 
descriptors: (a) the means and standard deviations of each of the test sections by overall score level; (b) the characteristics 
of reading and listening items answered correctly by students at different levels; (c) the test performance of US middle 
school students (both English learners and native English speakers), reported in Wolf and Steinberg (2011); (d) descriptors 
of the proficiency scales of the CEFR to which the test scores are mapped; (e) typical profiles of students across the test 
sections; and (f) the rubrics used to score the writing and speaking tasks (TOEFL Junior Comprehensive only). Tables 11 
and 12 summarize the results of the procedures used to define meaningful and reliable overall score levels with reference 
to the total scaled scores and to develop performance descriptors for each of the overall score levels for TOEFL Junior 
Standard and TOEFL Junior Comprehensive. 

Table 12 Overall Score Levels, Performance Descriptors, and Common European Framework of Reference (CEFR) Profiles for TOEFL 
Junior Comprehensive 

| Overall score level | Label | Overall performance descriptor | CEFR profile |
|---|---|---|---|
| | | These descriptions represent performance in middle schools, which use English for instruction. A typical student at this level: | A typical student at this level achieved these section-level CEFR scores: |
| 6 | Excellent | Consistently demonstrates the skills needed to communicate successfully at a high level in complex interactions and while using complex materials. | B2 for all sections |
| 5 | Advanced | Often demonstrates the skills needed to communicate successfully at a high level in complex interactions and while using complex materials. | B1 or B2 for reading and listening; B1 for speaking and writing |
| 4 | Competent | Demonstrates the skills needed to communicate successfully in some complex situations and in most simple interactions and while using basic materials. | B1 for reading and listening; B1 or A2 for speaking and writing |
| 3 | Achieving | Usually demonstrates the skills needed to communicate successfully in simple interactions and while using basic materials. | A2 or B1 for listening; A2 for reading, speaking, and writing |
| 2 | Developing | Occasionally demonstrates the skills needed to communicate successfully in simple interactions and while using basic materials. | A2 for reading and listening; below A2 for speaking and writing |
| 1 | Beginning | Demonstrates some basic language skills but needs to further develop those skills in order to communicate successfully. | Below A2 for all sections |

The Relationship of Overall Score Levels Between the Two TOEFL Junior Tests 

Despite the potential usefulness, relative to numeric scores, of reporting overall score levels and accompanying performance 
descriptors, there exists a potential for misuse of the score levels. One of these potential misuses would be to claim 
that results from the two TOEFL Junior tests are equivalent. To prevent this unjustified use, different numbers of overall 
score levels (five for TOEFL Junior Standard and six for TOEFL Junior Comprehensive) were developed for the two TOEFL 
Junior tests, as discussed earlier. In addition, empirical evidence was collected to illustrate why the aforementioned misuse 
is not warranted. Table 13 shows the relationship of the overall score levels between the tests. The results in the table were 
produced as part of the study that developed the overall score levels for TOEFL Junior Standard (Papageorgiou et al., 2014). 

What needs to be emphasized, as shown in the table, is that there is not a one-to-one correspondence in the overall 
score levels between the two tests. Instead, there is a probabilistic relationship between the overall score levels of the two 
tests. For example, for students who received the highest overall score level (Level 5) on the TOEFL Junior Standard, 
half of them are projected to receive Level 6 (the highest level on TOEFL Junior Comprehensive), while the remaining 
students are projected to obtain either Level 5 or 4. Furthermore, as explained in previous sections, the two TOEFL 
Junior tests measure different constructs and are composed of different sections with different structures. 

Table 13 Percentage in Each TOEFL Junior Comprehensive Overall Score Level Conditional on TOEFL Junior Standard Overall Score 
Level 

| TOEFL Junior Standard overall score level | Comprehensive Level 6 | Comprehensive Level 5 | Comprehensive Level 4 | Comprehensive Level 3 | Comprehensive Level 2 | Comprehensive Level 1 |
|---|---|---|---|---|---|---|
| 5 | 50% | 33% | 15% | 0% | 1% | 0% |
| 4 | 4% | 33% | 54% | 7% | 1% | 1% |
| 3 | 1% | 4% | 49% | 36% | 9% | 1% |
| 2 | 0% | 0% | 6% | 39% | 41% | 14% |
| 1 | 0% | 0% | 1% | 6% | 28% | 65% |

Note: Adapted from Development of Overall Score Levels and Performance Descriptors for the TOEFL Junior Standard Test, by 
S. Papageorgiou, R. Morgan, and V. Becker, 2014. 
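
The probabilistic reading of Table 13 can be illustrated with a short sketch that treats each row as a conditional distribution. The percentages are transcribed from the table; the function names are hypothetical and introduced only for this illustration.

```python
# Rows: TOEFL Junior Standard level. Columns: TOEFL Junior Comprehensive level.
# Values are the percentages reported in Table 13.
TABLE_13 = {
    5: {6: 50, 5: 33, 4: 15, 3: 0, 2: 1, 1: 0},
    4: {6: 4, 5: 33, 4: 54, 3: 7, 2: 1, 1: 1},
    3: {6: 1, 5: 4, 4: 49, 3: 36, 2: 9, 1: 1},
    2: {6: 0, 5: 0, 4: 6, 3: 39, 2: 41, 1: 14},
    1: {6: 0, 5: 0, 4: 1, 3: 6, 2: 28, 1: 65},
}


def most_likely_level(standard_level):
    """Return the Comprehensive level with the largest percentage in a row."""
    row = TABLE_13[standard_level]
    return max(row, key=row.get)


def percent_at_or_above(standard_level, comprehensive_level):
    """Sum the percentages at or above a given Comprehensive level."""
    row = TABLE_13[standard_level]
    return sum(pct for level, pct in row.items() if level >= comprehensive_level)


if __name__ == "__main__":
    for std_level in sorted(TABLE_13, reverse=True):
        print(std_level, most_likely_level(std_level), sum(TABLE_13[std_level].values()))
```

In this sketch, Standard Levels 4 and 3 share the same most likely Comprehensive level, and no row places all students in a single level, which illustrates why a one-to-one conversion between the two sets of levels is not supported. Row totals may differ slightly from 100, presumably because the reported percentages are rounded.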


For these two reasons, overall score levels should not be compared directly between the two tests. Rather, stakeholders 
should choose the test that best fits their needs and interests. For example, if the primary need of a score user is to track the 
developmental progress of students in a language learning program that values the balanced development of all of the four 
language skills, TOEFL Junior Comprehensive would be expected to provide more useful information for this specific use. 

Scoring Rubrics of the Speaking and Writing Tasks (TOEFL Junior Comprehensive Only) 

The speaking and writing scoring rubrics were developed in a multistage process. A small-scale prototype study was 
conducted with English learners in the United States in 2010 to trial prototype items and gather indicators of different 
levels of performance on the items. Experts experienced in evaluating the speaking and writing abilities of nonnative 
English speakers (e.g., TOEFL iBT® test certified raters) analyzed responses to the prototype items, and the results were 
used to formulate descriptors for the initial sets of scoring rubrics. Pilot study results were then used to further refine 
the scoring rubrics for each of the tasks and to establish benchmark and calibration samples for rater training. It should 
also be noted that speaking and writing ability, respectively, were considered the constructs to be measured and scored 
in the integrated speaking and writing items (see Tables 6 and 7). In other words, to avoid cases in which listening or 
reading stimulus comprehension difficulty compromises test takers’ ability to complete the integrated tasks, the reading 
and listening stimuli of the integrated items were written so as to be lower in comprehension difficulty than the texts used 
as stimuli in the listening and reading sections. 

Developing Scoring Rubrics for the Speaking Tasks 

The scoring rubrics with which a test taker’s spoken responses are to be evaluated were developed in three stages. First, test 
takers’ responses representing a wide range of speaking proficiency levels were sampled from responses collected during 
the prototyping stage. Second, raters with extensive experience in scoring TOEFL iBT and/or the TOEIC® tests were 
recruited to participate in the rubric development study. Third, raters were trained to rank order the sampled responses 
according to three dimensions: oral production, syntax and vocabulary, and content. In addition, the raters rank ordered 
the responses on overall fluency, a more holistic evaluation of speaking performance. Specific features for each dimension, 
and for overall fluency, include the following: 

Oral Production 

• Pronunciation is clear. 

• Intonation and stress effectively convey meaning. 

• Pacing is appropriate. 

• Occasional errors do not interfere with communication. 

Syntax and Vocabulary 

• Sentence and phrase types vary effectively. 

• Word form and word choice are correct. 

• Word choice is appropriate to context (e.g., representative of academic context). 

• Occasional errors do not interfere with communication. 

Content 

• Content is full and relevant. 

• Content is mostly accurate. 

• Content/idea(s) is clearly connected. 

Overall Fluency 

• Expression is fluid. 

• Intelligibility is high. 

• Ideas progress clearly (coherence). 

Finally, in addition to rank ordering samples, the raters were asked to provide written descriptions of each test taker’s 
performance to justify its ranking. During this process, a scoring rubric of 0-4 was developed. 

The rubrics resulting from this development study were then finalized based on the pilot administration with a larger sample 
of test takers from different countries. The final versions of the scoring rubrics are provided in Appendix F. 

Developing Scoring Rubrics for the Writing Tasks 

The process of creating the writing rubrics was similar to the process used for the speaking rubrics. First, test takers’ 
responses representing a wide range of writing proficiency levels were sampled from responses collected during the prototyping 
stage. Responses were selected only from the four items whose specifications were similar to those of the items 
that were selected to be piloted. Second, raters with extensive experience in scoring TOEFL iBT and/or TOEIC writing 
items were recruited to participate in the rubric development study. 

The raters were trained to rank order the sample responses according to four dimensions: content, syntax, vocabulary, 
and mechanics/conventions. In addition, raters were trained to rank order the responses in terms of overall writing quality. 
Finally, in each category, raters were asked to list the features of each response that they considered to be most salient, the 
goal being to provide a rationale for the rankings assigned as well as to support the creation of detailed feature descriptors 
for the rating scale. Based on these results, scoring rubrics on a 0-4 scale were developed for each of the four writing tasks 
and later refined based on the additional response samples collected during the pilot administrations around the world. 
The resulting rubrics are presented in Appendix G. 

Interpretive Argument and Supporting Research 

To support the adequacy and appropriateness of TOEFL Junior scores for the intended test uses outlined earlier, collecting 
diverse sources of validity evidence is essential. The framework for gathering evidence to validate TOEFL Junior test 
score interpretation and use is based on the interpretive argument structure approach (Bachman & Palmer, 2010; Kane, 
Crooks, & Cohen, 1999; Mislevy, Steinberg, & Almond, 2003; Toulmin, 2003). Chapelle, Enright, and Jamieson (2008) 
provided a comprehensive account of how the interpretive argument approach was utilized as a validation framework for 
TOEFL iBT test score interpretation and use. In this framework, various types of inferences are made based on warrants 
or statements that connect test scores to their meanings and uses. To back up the warrants supporting each inference, 
evidence needs to be collected. Table 14 illustrates inferences, warrants, and types of research needed to yield supportive 
evidence for validating TOEFL Junior uses. The test design team has referred to the framework to collect validity evidence 
at different test development stages, and this effort will continue to provide research support to ensure that the TOEFL 
Junior scores are interpreted and used validly. The penultimate column of the table indicates whether each area of research 
was addressed at the time of test development, has been conducted subsequent to the introduction of the test, or has yet 
to be completed. In addition, a reference is provided in the last column if the documentation is publicly available. 

Notes 

1 Details about the relationship between TOEFL Junior scores and the CEFR levels in each of the TOEFL Junior tests can be found 
on the TOEFL Junior website at https://www.ets.org/toefl_junior/scores_research/standard/cefr (for TOEFL Junior Standard) 
and at http://www.ets.org/toefl_junior/scores_research/comprehensive/cefr/ (for TOEFL Junior Comprehensive). 

2 This trialing process is different from pretesting because trial items are administered to students who are believed to represent the 
target test-taker population. Conversely, pretest items are administered to actual test takers at the time when they are taking an 
operational test. 

Appendix A 

Summary of the Curricula and Standards Reviews: Listening 

The curricula and standards reviews indicate that the language use in the three TLU subdomains, that is, social and 
interpersonal, navigational, and academic, differs more by the genres of spoken discourse than by the listening subskills. In 
other words, the listening subskills required overlap across the three TLU subdomains, which can be seen in the following 
table. 

Table A1 Common Listening Subskills in Multiple Subdomains 

| Subskill | Examples from ELP standards | TLU subdomain |
|---|---|---|
| Understanding the main idea and supporting details | “Identify and explain the main ideas and some details of texts” (CA) | Social and interpersonal, navigational, and academic subdomains |
| Identifying important details | “Listen and gain information for a variety of purposes, such as summarizing main ideas and supporting details” (FL) | Social and interpersonal, navigational, and academic subdomains |
| Making inferences or predictions | “Understand implicit ideas and information in increasingly complex spoken language commensurate with grade-level learning expectations” (TX) | Social and interpersonal, navigational, and academic subdomains |
| Interpreting prosodic features such as intonation and contrastive stress | “Distinguish sounds and intonation patterns of English with increasing ease” (TX) | Social and interpersonal subdomain |
| Understanding a speaker’s purpose | “Identify speaker attitude and point of view” (MI) | Social and interpersonal and academic subdomains |

Note. ELP = English language proficiency; TLU = target language use; CA = California; FL = Florida; MI = Michigan; TX = Texas. 

Unlike the listening subskills, which are commonly applied to all subdomains, the types of spoken discourse are found 
to differ across subdomains. The genre, topic/content, and linguistic characteristics of spoken discourse required in each 
subdomain are summarized in the following table. 

Table A2 Types of Spoken Discourse in Each Subdomain 

Social and interpersonal subdomain 

Genre: Conversations 

Topic/content: Personal 
• Feelings 
• Opinions 
• Experiences 
• Events 

Characteristics of input/stimuli: 
Form: a dialog/multiparty conversation 
Length: a number of turn-taking sentences 
Language characteristics: 
• Lexical: basic, familiar vocabulary 
• Grammatical: simple to complex sentences 
• Discourse: a coherent dialog 
• Pragmatic: expressing feeling/opinions; narrating; delivering information; describing 

Navigational subdomain 

Genre: Directions; Announcements 

Topic/content: Class-related 
• Field trip 
• Homework 
• School announcement 

Characteristics of input/stimuli: 
Form: a monolog 
Length: a sentence to several sentences 
Language characteristics: 
• Lexical: basic, familiar, academic vocabulary 
• Grammatical: simple to complex sentences 
• Discourse: a coherent monolog 
• Pragmatic: delivering information; describing; instructing; reminding; announcing; requesting 

Academic subdomain 

Genre: Lectures; Academic discussions 

Topic/content: Academic content-related 
• Science 
• Social studies 
• Literature 
• Math 

Characteristics of input/stimuli: 
Form: a monolog/multiparty discussion 
Length: sustained discourse about an academic topic 
Language characteristics: 
• Lexical: academic vocabulary 
• Grammatical: simple to complex sentences 
• Discourse: a coherent discourse about a given topic 
• Pragmatic: summarizing, describing, analyzing, and evaluating 


Appendix B 

Summary of the Curricula and Standards Reviews: Reading 

As in listening, common reading subskills are found to be required in all of the three TLU subdomains. The following 
table summarizes the reading subskills. Note that all of the reading subskills summarized in the table apply to all three 
subdomains: social and interpersonal, navigational, and academic. 

Table B1 Reading Subskills Common to all Subdomains 

| Subskill | Examples from ELP standards |
|---|---|
| Understanding the main idea | “Identify and explain the main ideas and some details of texts” (CA); “Identify important details, essential message, and main idea of a text” (FL) |
| Identifying important details | “Listen and gain information for a variety of purposes, such as summarizing main ideas and supporting details” (FL) |
| Making inferences or predictions | “Make predictions, inferences, and deductions, and describe different levels of meaning of literary works presented orally and in written form, including literal and implied meanings” (NY) |
| Inferring the meaning of a word from context/understanding figurative and idiomatic language from context | “Employ phonemic awareness, inference, contextual clues, synonyms and antonyms relationships to analyze words and text” (FL); “Apply knowledge of word relationships, such as roots and affixes, to derive meaning from literature and texts in content areas” (CA) |
| Recognizing an author’s purpose | “Identify speaker attitude and point of view” (MI) |

Note. ELP = English language proficiency; CA = California; FL = Florida; MI = Michigan; NY = New York. 


The three TLU subdomains are found to differ by the genres of reading materials that students are required to 
understand for different purposes. The topics and the linguistic characteristics are also found to change with different 
genres. 


Table B2 Types of Written Genre in Each Subdomain 

Social and interpersonal subdomain 

Genre: Correspondence (e.g., e-mails and letters) 

Topic/content: Personal 
• Feelings 
• Opinions 
• Experiences 
• Events 

Characteristics of input/stimuli: 
Form: a written letter, e-mail, social media site post, text message 
Length: varied: a few words to multiple paragraphs 
Language characteristics: 
• Lexical: mostly basic vocabulary; idiomatic expressions 
• Grammatical: simple to complex sentences 
• Discourse: a coherent text 
• Pragmatic: using appropriate register; delivering information; explaining; describing 

Navigational subdomain 

Genre: Nonlinear text (e.g., schedules and announcements); Brochures; Journalism 

Topic/content: Class-related 
• Field trip 
• Homework 
• School announcement 

Characteristics of input/stimuli: 
Form: chart, graph, poster, flyer including a written text, advertisement, brochure, graphic (schedule) 
Length: varied: phrases, a few sentences to multiple paragraphs 
Language characteristics: 
• Lexical: basic to some academic vocabulary 
• Grammatical: simple to complex sentences 
• Discourse: fragments to simple sentences 
• Pragmatic: delivering information 

Academic subdomain 

Genre: Text about an academic topic 

Topic/content: Academic content-related 
• Science 
• Social studies 
• Literature 
• Math 

Characteristics of input/stimuli: 
Form: a written text 
Length: a paragraph to multiple paragraphs 
Language characteristics: 
• Lexical: basic to academic vocabulary 
• Grammatical: simple to complex sentences 
• Discourse: a coherent text 
• Pragmatic: describing; analyzing; comparing; contrasting; evaluating; commenting 

| Task | ELP standards | Characteristics of input | Characteristics of expected response |
|---|---|---|---|
| | | • Grammatical: simple to complex sentences • Pragmatic: delivering information | • Grammatical: simple to complex sentences • Discourse: a coherent text |

Note: (Korea), (France), (Japan), (China), (Chile) indicate sources of international curricula or textbooks, and the state names in the English language proficiency (ELP) standards column (CA 
= California; CO = Colorado; FL = Florida; NY = New York; TX = Texas) indicate the state where the specific standards were found. 

Appendix F 

Scoring Rubrics for Speaking Tasks 

TOEFL Junior Speaking Scoring Guide 
Task 1: Read Aloud 


Score 

Fluency and Accuracy Descriptors 

4 

A typical response at this level is characterized by the following: 

• Reading is mostly fluid and intelligible 

• Words are grouped in meaningful phrases with effective pauses. Punctuation is marked 
appropriately throughout 

• Intonation varies to match text provided 

• Speech is clear and distinct with only minor mispronunciations, substitutions, or omissions 

• Rate of speech is mostly appropriate 

3 

A typical response at this level is characterized by the following: 

• Reading is fairly fluid and intelligible 

• Words are generally grouped in meaningful phrases with only minor lapses. Punctuation is 
usually marked appropriately 

• Intonation may seem flat/monotone at times 

• Speech is clear and distinct most of the time; some mispronunciations, substitutions, or 
omissions may be noticeable but do not impact overall intelligibility 

• Rate of speech is mostly appropriate; occasional variation may cause minor lapses in 
intelligibility 

2 

A typical response at this level is characterized by the following: 

• Reading is noticeably choppy and unintelligible at times; sometimes is read word-by-word 
without meaningfully grouped phrases. Punctuation may not be marked at times. 

• Intonation may often be flat or monotone 

• Speech is clear and distinct at times but noticeable mispronunciations, substitutions, 
omissions, and self-corrections interrupt the flow and may impact overall intelligibility 

• Rate of speech is inappropriate at times (resulting in choppy pace or slurred words and 
mispronunciations) 

• Note: A response at this level may also be marked by numerous substitutions, omissions, 
and attempts to paraphrase, rather than read, sections of the text. 

1 

A typical response at this level is characterized by the following: 

• Reading is hard to follow and mostly unintelligible, with multiple starts and stops. Reading 
may be incomplete 

• Intonation is rarely used effectively 

• Frequent errors in pronunciation and stress; words may not be comprehensible 

• Self-corrections are ineffective most of the time. Substitutions may alter meaning 
substantially 

• Slow rate of speech 

• Punctuation rarely marked 

0 

No attempt to respond OR No English in the response OR Response is off topic OR Insufficient 
language to evaluate 
TOEFL Junior Speaking Scoring Guide 
Task 2: Six-Picture Narration 

Score 

Content, Delivery and Language Use Descriptors 

4 

A typical response at this level is characterized by the following: 

• Story is full, relevant to the pictures, and includes some detail and elaboration. Only minor 
lapses in content or coherence. Events unfold evenly and the sequence is easy to follow. 

• Overall fluidity of expression is evident; fairly smooth and confident rate of speech; little 
hesitancy. Errors of pronunciation, stress, and intonation may occur but rarely obscure 
meaning. 

• Grammar and word choice are varied, appropriate to the task, and effectively used to convey 
meaning clearly. Errors rarely obscure meaning. Use of connecting devices helps to link events 
for the listener. 

3 

A typical response at this level is characterized by the following: 

• Story is mostly complete but may include some noticeable lapses in content or coherence. 
Description of some key events may lack detail or elaboration. Some details may be confusing 
to the listener. 

• Mostly fluid expression but some hesitancy and choppiness may be noticeable. Errors in 
pronunciation, stress, and intonation may occasionally obscure meaning. Able to sustain 
speech to complete the story. 

• Some limitations and errors in grammar and word choice are noticeable but meaning is rarely 
obscured. Use of connecting devices to link events may be limited. 

2 

A typical response at this level is characterized by the following: 

• Limited development of the story. Most events are recounted with little to no elaboration or 
detail. Some events or details may be difficult to follow for listeners not familiar with the 
pictures. Limited development and cohesion may cause listener to fill in the gaps. 

• May sustain speech throughout but pace may be slow, choppy, or hesitant throughout. Errors 
in pronunciation, stress, and intonation occasionally impact intelligibility and flow. 

• May struggle to convey the story due to limitations in grammar and word choice. May rely on 
mostly simple grammatical constructions and basic vocabulary. These limitations and errors 
may result in vague or unclear meaning at times. 

1 

A typical response at this level is characterized by the following: 

• Very limited development of story; may be incomplete. Story lacks detail or elaboration. 

• Generally unable to sustain speech throughout to complete a story. Frequent errors in 
pronunciation, stress, and intonation impact intelligibility. 

• Most utterances are characterized by errors. Vocabulary is limited and often inaccurate. 

0 

No attempt to respond OR No English in the response OR Response consists of a repetition of the 
prompt OR response is off topic 
TOEFL Junior Speaking Scoring Guide 
Tasks 3 and 4: Listen-Speak 


Score 

Content, Delivery and Language Use Descriptors 

4 

A typical response at this level is characterized by the following: 

• Content is full and appropriate to the task. Key information is conveyed coherently and accurately with 
some elaboration and detail although minor errors may occur. Connection among ideas is clear. 

• Speech is mostly clear and fluid with occasional imperfections. Minor errors of pronunciation, stress, and 
intonation do not interfere with understanding. Mispronunciation of key content words may occur but 
rarely obscure meaning. 

• Grammar and word choice are varied, appropriate to the task, and effectively convey meaning. 
Occasional errors of word form and grammar may occur but rarely obscure meaning. 


3 

A typical response at this level is characterized by the following: 

• Content is mostly complete and appropriate to the task. Most key information is conveyed accurately but 
supporting details and elaboration are limited or lacking; minor inaccuracies or omissions may be evident. 
Response is fairly cohesive with minor lapses. 

• Response is mostly fluid and sustained with some lapses or imperfections evident. Pronunciation, stress, 
and intonation errors are noticeable but do not usually interfere with understanding. May struggle with 
pronunciation of unfamiliar content words. 

• Basic grammar and vocabulary are usually controlled although minor errors may occur. Awkward or 
inappropriate phrases may occur as the speaker attempts new constructions but these do not cause 
major misunderstandings. 


2 


A typical response at this level is characterized by the following: 

• Development is mostly limited to some (or all) of the main facts, presented one by one. Relies on the 
listener to make the connections between facts most of the time. Key information may be vaguely 
expressed or incomplete. Some misunderstanding of the talk may be evident. Some key information may 
be omitted or inaccurate. 

• Response may be fluid at times but speaker struggles to sustain. Response may be characterized by slow, 
choppy, or hesitant delivery. Errors in pronunciation, word stress, and intonation are evident and may 
interfere with understanding at times. 

• Lacks sufficient range and control of grammar and vocabulary to provide a concise summary of 
information. May rely on basic vocabulary to convey meaning. Linguistic errors are evident, may be 
systematic, and occasionally interfere with understanding. 

1 

A typical response at this level is characterized by the following: 

• Content is incomplete and/or lacks development. Information conveyed is limited and may be vague or 
inaccurate. 

• Struggles to sustain speech to complete the task (or may sustain speech for only brief segments at a time, 
stopping and starting often). Pronunciation errors are evident but speaker may be understandable at 
times to the sympathetic listener. 

• May rely heavily on basic, high-frequency vocabulary or familiar, rehearsed phrases to convey content. 
Vocabulary is limited and often inaccurate. There may be little use of modifiers. Struggles to construct 
grammatical utterances beyond a few words. 

0 

No attempt to respond OR No English in the response OR Response consists of a repetition of the prompt OR 
Response is off topic 
Appendix G 


Scoring Rubrics for Writing Tasks 

TOEFL Junior Writing Scoring Guide 
Task 1: Edit 


Score 

Language Use 

4 

• Corrects all four errors 

3 

• Corrects three errors 

2 

• Corrects two errors 

1 

• Corrects one error 

0 

Attempts to correct all errors but does so incorrectly 

OR 

Makes no attempt to correct errors, only copies words from the stimulus, consists of only 
unrelated content, consists of keystroke characters, or is written in a foreign language 
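
The Edit rubric above is a simple counting rule, so it can be illustrated with a short sketch. The function name, the list-of-flags input, and the `scorable` parameter are assumptions made for this illustration only; they are not part of the TOEFL Junior scoring materials, and the sketch does not describe how responses are actually rated.

```python
from typing import Sequence

TOTAL_ERRORS = 4  # the Edit rubric refers to four errors in the stimulus


def edit_task_score(corrected: Sequence[bool], scorable: bool = True) -> int:
    """Return a 0-4 score following the Task 1 (Edit) rubric.

    corrected: one flag per error in the stimulus, True if that error was
               corrected properly (hypothetical input format).
    scorable:  False for responses the rubric assigns 0 outright, such as
               text copied from the stimulus, unrelated content, keystroke
               characters, or a response in a foreign language.
    """
    if len(corrected) != TOTAL_ERRORS:
        raise ValueError(f"expected {TOTAL_ERRORS} flags, got {len(corrected)}")
    if not scorable:
        return 0
    # One point per correctly fixed error; no correct fixes yields 0.
    return sum(1 for flag in corrected if flag)
```

For example, a response that fixes the first and third errors but not the other two would receive `edit_task_score([True, False, True, False])`, which is 2.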
TOEFL Junior Writing Scoring Guide 
Task 2: E-mail 


Score 

Development and Language Use Descriptors 

4 

A typical response at this level is characterized by the following: 

• responds to all questions in the e-mail, directly or indirectly 

• is coherent 

• shows lexical variation appropriate for the task 

• displays a varied sentence structure appropriate for the task 

• may contain minor errors but they do not interfere with meaning 

3 

A typical response at this level is characterized by the following: 

• responds to most of the questions in the e-mail, directly or indirectly 

• is generally coherent 

• shows some lexical variation appropriate for the task 

• may display variation in sentence structure appropriate for the task 

• may contain some errors that occasionally interfere with meaning 

2 

A typical response at this level is characterized by the following: 

• responds to some questions in the e-mail 

• may be incoherent at times 

• shows little lexical variation (e.g., vocabulary is simple and repetitive), or often uses 
vocabulary incorrectly 

• may show little control of sentence structures 

• may contain errors that frequently interfere with meaning 

1 

A typical response at this level is characterized by the following: 

• responds minimally to questions in the e-mail 

• is generally incoherent 

• displays limited vocabulary that may be used incorrectly 

• uses mostly incorrect sentence structures 

• displays many errors that seriously interfere with meaning 

0 

Only copies words from the prompt, rejects the prompt, is completely off topic, consists of 
keystroke characters, is written in a foreign language, or is blank 
TOEFL Junior Writing Scoring Guide 
Task 3: Opinion 


Score 

Development and Language Use Descriptors 

4 

A typical response at this level is characterized by the following: 

• states a position on the topic 

• provides support for the position, with specific details and/or examples 

• is mostly well organized and coherent 

• shows lexical variation appropriate for the task 

• displays a varied sentence structure appropriate for the task 

• may contain minor errors but they do not interfere with meaning or clarity 

3 

A typical response at this level is characterized by the following: 

• states a position on the topic 

• provides support for the stated position, but may have difficulty doing so fully 

• is generally well organized, with an occasional lapse of clarity when connecting ideas 

• shows some lexical variation appropriate for the task 

• may display some variation in sentence structure appropriate for the task 

• may contain some errors that occasionally interfere with meaning 

2 

A typical response at this level is characterized by the following: 

• states a position on the topic, but provides inadequate/incomplete support, 

OR 

• only vaguely implies a position on the topic, and provides inadequate/incomplete support 

• connections between ideas are attempted, but are sometimes unclear or missing 

• shows little lexical variation (e.g., vocabulary is simple and repetitive), or frequently uses 
vocabulary incorrectly 

• shows little variation in sentence structure (e.g., sentences are mostly simple and short), and 
shows little control of sentence structures 

• may contain errors that frequently interfere with meaning 

1 

A typical response at this level is characterized by the following: 

• states a position but provides incoherent or no support 

OR 

• does not state a position, or makes only a minimal connection to the prompt and provides 
minimal or no support 

• is generally unorganized and incoherent 

• displays extremely limited vocabulary that is frequently used incorrectly 

• uses mostly incorrect sentence structures 

• displays many errors that seriously interfere with meaning 

0 

Only copies words from the prompt, rejects the prompt, is completely off topic, consists of 
keystroke characters, is written in a foreign language, or is blank 
TOEFL Junior Writing Scoring Guide 
Task 4: Listen-Write 


Score 

Development and Language Use 

4 

A typical response at this level is characterized by the following: 

• accurately provides all key points 

• provides support using relevant details from the talk 

• is mostly well organized and coherent 

• shows lexical variation appropriate for the task 

• displays a varied sentence structure appropriate for the task 

• may contain errors but they do not interfere with meaning or clarity 

3 

A typical response at this level is characterized by the following: 

• accurately provides most key points 

• provides some supporting details from the talk 

• is generally organized, with an occasional lapse of clarity when connecting ideas 

• shows some lexical variation appropriate for the task 

• may display some varied sentence structure appropriate for the task 

• may contain errors that occasionally interfere with meaning 

2 

A typical response at this level is characterized by the following: 

• provides some accurate content from the key points 

• provides minimal or no supporting details from the talk 

• connections between ideas are attempted but are often unclear or missing 

• shows little lexical variation (e.g., vocabulary is simple and repetitive), or frequently uses vocabulary 
incorrectly 

• shows little variation in sentence structure (e.g., sentences are mostly simple and short), or shows 
little control of sentence structures 

• may contain errors that frequently interfere with meaning 

1 

A typical response at this level is characterized by the following: 

• provides minimal or no content from the key points 

• does not provide details beyond those shown in the visuals 

• provides incoherent or no support for any of the points 

• is generally unorganized and incoherent 

• displays extremely limited vocabulary that is frequently used incorrectly 

• uses mostly incorrect sentence structures 

• displays many errors that seriously interfere with meaning 

0 

Only copies words from the prompt, rejects the prompt, is completely off topic, consists of keystroke 
characters, is written in a foreign language, or is blank 